@richiejp
Owner

Eventually, Assistant Mode will let you control your desktop with natural speech and let a VLM describe what is on the desktop. For acting on the desktop we can use MCP servers (tool calls) or a VLM that can locate items on screen by their coordinates and click them.

Initially, though, this PR just lets you speak with an LLM using audio in both directions over the OpenAI realtime API.
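As a rough sketch of what "audio both ways" means on the wire: the realtime API is a WebSocket carrying JSON events. The event names below (`session.update`, `input_audio_buffer.append`) are from the published OpenAI realtime API; the transport, microphone capture, and playback are omitted, so this only shows the message shapes, not a working client.

```python
import base64
import json

def session_update(voice="alloy"):
    """Build the session.update event that requests audio in and audio out."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
        },
    })

def audio_append(pcm16_bytes):
    """Build the event that streams one chunk of raw PCM16 mic audio."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        # Audio chunks are base64-encoded inside the JSON event.
        "audio": base64.b64encode(pcm16_bytes).decode("ascii"),
    })
```

In a real client these strings would be sent over a WebSocket to the realtime endpoint, and the server's `response.audio.delta` events would be decoded and played back the same way in reverse.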

For LocalAI support this requires mudler/LocalAI#6245, which implements the conversational parts of the API; tool calls and the multi-modal support needed for a full desktop assistant will follow.

richiejp force-pushed the feat/assistant-mode branch from 7d35423 to 0d0da49 on January 7, 2026 at 14:31
richiejp changed the title from "Assistant mode" to "Assistant mode: voice only" on Jan 7, 2026
richiejp merged commit 6dd5344 into main on Jan 7, 2026
1 check passed